kernel cca
RKUM: An R Package for Robust Kernel Unsupervised Methods
RKUM is an R package developed for implementing robust kernel-based unsupervised methods. It provides functions for estimating the robust kernel covariance operator (CO) and the robust kernel cross-covariance operator (CCO) using generalized loss functions instead of the conventional quadratic loss. These operators form the foundation of robust kernel learning and enable reliable analysis under contaminated or noisy data conditions. The package includes implementations of robust kernel canonical correlation analysis (Kernel CCA), as well as the influence function (IF) for both standard and multiple kernel CCA frameworks. The influence function quantifies sensitivity and helps detect influential or outlying observations across two-view and multi-view datasets. Experiments using synthesized two-view and multi-view data demonstrate that the IF of the standard kernel CCA effectively identifies outliers, while the robust kernel methods implemented in RKUM exhibit reduced sensitivity to contamination. Overall, RKUM provides an efficient and extensible platform for robust kernel-based analysis in high-dimensional data applications.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Reading (0.04)
- (6 more...)
Cross-Domain Matching for Bag-of-Words Data via Kernel Embeddings of Latent Distributions
Yuya Yoshikawa, Tomoharu Iwata, Hiroshi Sawada, Takeshi Yamada
We propose a kernel-based method for finding matching between instances across different domains, such as multilingual documents and images with annotations. Each instance is assumed to be represented as a multiset of features, e.g., a bag-of-words representation for documents. The major difficulty in finding cross-domain relationships is that the similarity between instances in different domains cannot be directly measured. To overcome this difficulty, the proposed method embeds all the features of different domains in a shared latent space, and regards each instance as a distribution of its own features in the shared latent space. To represent the distributions efficiently and nonparametrically, we employ the framework of the kernel embeddings of distributions. The embedding is estimated so as to minimize the difference between distributions of paired instances while keeping unpaired instances apart. In our experiments, we show that the proposed method can achieve high performance on finding correspondence between multi-lingual Wikipedia articles, between documents and tags, and between images and tags.
Cross-Domain Matching for Bag-of-Words Data via Kernel Embeddings of Latent Distributions
We propose a kernel-based method for finding matching between instances across different domains, such as multilingual documents and images with annotations. Each instance is assumed to be represented as a multiset of features, e.g., a bag-ofwords representation for documents. The major difficulty in finding cross-domain relationships is that the similarity between instances in different domains cannot be directly measured. To overcome this difficulty, the proposed method embeds all the features of different domains in a shared latent space, and regards each instance as a distribution of its own features in the shared latent space. To represent the distributions efficiently and nonparametrically, we employ the framework of the kernel embeddings of distributions. The embedding is estimated so as to minimize the difference between distributions of paired instances while keeping unpaired instances apart. In our experiments, we show that the proposed method can achieve high performance on finding correspondence between multi-lingual Wikipedia articles, between documents and tags, and between images and tags.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Communications > Social Media (0.88)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)
Kernel canonical correlation analysis approximates operators for the detection of coherent structures in dynamical data
Klus, Stefan, Husic, Brooke E., Mollenhauer, Mattes
We illustrate relationships between classical kernel-based dimensionality reduction techniques and eigendecompositions of empirical estimates of reproducing kernel Hilbert space (RKHS) operators associated with dynamical systems. In particular, we show that kernel canonical correlation analysis (CCA) can be interpreted in terms of kernel transfer operators and that coherent sets of particle trajectories can be computed by applying kernel CCA to Lagrangian data. We demonstrate the efficiency of this approach with several examples, namely the well-known Bickley jet, ocean drifter data, and a molecular dynamics problem with a time-dependent potential. Furthermore, we propose a straightforward generalization of dynamic mode decomposition (DMD) called coherent mode decomposition (CMD).
Gene Shaving using influence function of a kernel method
Alam, Md. Ashad, Shahjama, Mohammad, Rahman, Md. Ferdush
Identifying significant subsets of the genes, gene shaving is an essential and challenging issue for biomedical research for a huge number of genes and the complex nature of biological networks,. Since positive definite kernel based methods on genomic information can improve the prediction of diseases, in this paper we proposed a new method, "kernel gene shaving (kernel canonical correlation analysis (kernel CCA) based gene shaving). This problem is addressed using the influence function of the kernel CCA. To investigate the performance of the proposed method in a comparison of three popular gene selection methods (T-test, SAM and LIMMA), we were used extensive simulated and real microarray gene expression datasets. The performance measures AUC was computed for each of the methods. The achievement of the proposed method has improved than the three well-known gene selection methods. In real data analysis, the proposed method identified a subsets of $210$ genes out of $2000$ genes. The network of these genes has significantly more interactions than expected, which indicates that they may function in a concerted effort on colon cancer.
- Asia > Middle East > Jordan (0.04)
- Europe > Germany > Berlin (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > Bangladesh > Rangpur Division > Rangpur District > Rangpur (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.68)
Influence Function and Robust Variant of Kernel Canonical Correlation Analysis
Alam, Md. Ashad, Fukumizu, Kenji, Wang, Yu-Ping
Many unsupervised kernel methods rely on the estimation of the kernel covariance operator (kernel CO) or kernel cross-covariance operator (kernel CCO). Both kernel CO and kernel CCO are sensitive to contaminated data, even when bounded positive definite kernels are used. To the best of our knowledge, there are few well-founded robust kernel methods for statistical unsupervised learning. In addition, while the influence function (IF) of an estimator can characterize its robustness, asymptotic properties and standard error, the IF of a standard kernel canonical correlation analysis (standard kernel CCA) has not been derived yet. To fill this gap, we first propose a robust kernel covariance operator (robust kernel CO) and a robust kernel cross-covariance operator (robust kernel CCO) based on a generalized loss function instead of the quadratic loss function. Second, we derive the IF for robust kernel CCO and standard kernel CCA. Using the IF of the standard kernel CCA, we can detect influential observations from two sets of data. Finally, we propose a method based on the robust kernel CO and the robust kernel CCO, called {\bf robust kernel CCA}, which is less sensitive to noise than the standard kernel CCA. The introduced principles can also be applied to many other kernel methods involving kernel CO or kernel CCO. Our experiments on synthesized data and imaging genetics analysis demonstrate that the proposed IF of standard kernel CCA can identify outliers. It is also seen that the proposed robust kernel CCA method performs better for ideal and contaminated data than the standard kernel CCA.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (8 more...)
- Health & Medicine > Therapeutic Area > Oncology (0.67)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Kernel Methods (0.75)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)
Sparse Kernel Canonical Correlation Analysis via $\ell_1$-regularization
Zhang, Xiaowei, Chu, Delin, Liao, Li-Zhi, Ng, Michael K.
Canonical correlation analysis (CCA) is a multivariate statistical technique for finding the linear relationship between two sets of variables. The kernel generalization of CCA named kernel CCA has been proposed to find nonlinear relations between datasets. Despite their wide usage, they have one common limitation that is the lack of sparsity in their solution. In this paper, we consider sparse kernel CCA and propose a novel sparse kernel CCA algorithm (SKCCA). Our algorithm is based on a relationship between kernel CCA and least squares. Sparsity of the dual transformations is introduced by penalizing the $\ell_{1}$-norm of dual vectors. Experiments demonstrate that our algorithm not only performs well in computing sparse dual transformations but also can alleviate the over-fitting problem of kernel CCA.
- North America > Canada (0.04)
- Asia > Singapore (0.04)
- North America > United States > Ohio (0.04)
- (4 more...)
Learning Schizophrenia Imaging Genetics Data Via Multiple Kernel Canonical Correlation Analysis
Richfield, Owen, Alam, Md. Ashad, Calhoun, Vince, Wang, Yu-Ping
Kernel and Multiple Kernel Canonical Correlation Analysis (CCA) are employed to classify schizophrenic and healthy patients based on their SNPs, DNA Methylation and fMRI data. Kernel and Multiple Kernel CCA are popular methods for finding nonlinear correlations between high-dimensional datasets. Data was gathered from 183 patients, 79 with schizophrenia and 104 healthy controls. Kernel and Multiple Kernel CCA represent new avenues for studying schizophrenia, because, to our knowledge, these methods have not been used on these data before. Classification is performed via k-means clustering on the kernel matrix outputs of the Kernel and Multiple Kernel CCA algorithm. Accuracies of the Kernel and Multiple Kernel CCA classification are compared to that of the regularized linear CCA algorithm classification, and are found to be significantly more accurate. Both algorithms demonstrate maximal accuracies when the combination of DNA methylation and fMRI data are used, and experience lower accuracies when the SNP data are incorporated.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Asia > Japan (0.04)
- Asia > Bangladesh (0.04)
Identifying Outliers using Influence Function of Multiple Kernel Canonical Correlation Analysis
Imaging genetic research has essentially focused on discovering unique and co-association effects, but typically ignoring to identify outliers or atypical objects in genetic as well as non-genetics variables. Identifying significant outliers is an essential and challenging issue for imaging genetics and multiple sources data analysis. Therefore, we need to examine for transcription errors of identified outliers. First, we address the influence function (IF) of kernel mean element, kernel covariance operator, kernel cross-covariance operator, kernel canonical correlation analysis (kernel CCA) and multiple kernel CCA. Second, we propose an IF of multiple kernel CCA, which can be applied for more than two datasets. Third, we propose a visualization method to detect influential observations of multiple sources of data based on the IF of kernel CCA and multiple kernel CCA. Finally, the proposed methods are capable of analyzing outliers of subjects usually found in biomedical applications, in which the number of dimension is large. To examine the outliers, we use the stem-and-leaf display. Experiments on both synthesized and imaging genetics data (e.g., SNP, fMRI, and DNA methylation) demonstrate that the proposed visualization can be applied effectively.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Bangladesh (0.04)
Gene-Gene association for Imaging Genetics Data using Robust Kernel Canonical Correlation Analysis
Alam, Md ashad, Komori, Osamu, Wang, Yu-Ping
In genome-wide interaction studies, to detect gene-gene interactions, most methods are divided into two folds: single nucleotide polymorphisms (SNP) based and gene-based methods. Basically, the methods based on the gene are more effective than the methods based on a single SNP. Recent years, while the kernel canonical correlation analysis (Classical kernel CCA) based U statistic (KCCU) has proposed to detect the nonlinear relationship between genes. To estimate the variance in KCCU, they have used resampling based methods which are highly computationally intensive. In addition, classical kernel CCA is not robust to contaminated data. We, therefore, first discuss robust kernel mean element, the robust kernel covariance, and cross-covariance operators. Second, we propose a method based on influence function to estimate the variance of the KCCU. Third, we propose a nonparametric robust KCCU method based on robust kernel CCA, which is designed for contaminated data and less sensitive to noise than classical kernel CCA. Finally, we investigate the proposed methods to synthesized data and imaging genetic data set. Based on gene ontology and pathway analysis, the synthesized and genetics analysis demonstrate that the proposed robust method shows the superior performance of the state-of-the-art methods.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Bangladesh (0.04)